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Abstract — We propose and analyze a novel way for increasing 
the area spectral efficiency of video transmission in cellular 
communications systems. Small base stations called helpers with 
low-rate backhaul but high storage capacity cache popular video 
files. Mobile stations requesting video files receive them via short- 
range transmissions from helpers, if they are present in their 
caches; files not available from helpers are transmitted by the 
base station. We analyze the optimum way of assigning files 
to the different helpers, in order to minimize the expected file 
downloading time for files. We distinguish between the uncoded 
case (where only complete files are stored) and the coded case, 
where segments of Fountain-encoded versions of the video files 
are stored at helpers. We show that for the uncoded files, 
the optimum file assignment is NP-hard, and develop a greedy 
strategy that is provably within a factor 2 of the optimum. We 
furthermore show that for the coded case, the optimum file 
assignment can be formulated as a convex problem that can 
be further reduced to a linear program. Numerical simulations 
based on measured traces of YouTube requests demonstrate that 
our approach can improve video throughput by a factor 3 — 5 
in realistic settings. 

I. Introduction 

Video is the main reason for the dramatic growth of data 
traffic over cellular communications networks - an increase 
by two orders of magnitude compared to current volume is 
expected in the next five years |1|. While this constitutes 
a great business opportunity, there is the danger that this 
increased demand will lead to a slowdown, or even breakdown, 
of cellular systems, frustrating customers, and essentially 
jeopardizing the success of more and more advanced wireless 
data services. Therefore, the tremendous increase in mobile 
video content demand will require a corresponding increase of 
area spectral efficiency of wireless networks. The most promis- 
ing approach to achieve this is shrinking the cell sizes and 
essentially bringing the content closer to the users. This can 
be implemented by deploying small base stations that achieve 
localized communication and enable high-density spatial reuse 
of communication resources |2|. Such pico- and femto-cell 
networks, which are usually combined with macrocells into 
a heterogeneous network, are receiving a lot of attention in 
the recent literature, (see e.g. |3| and references therein). 
However, the requirement for high-speed backhaul to the 
cellular operator network makes this an expensive option 

In this paper we suggest a radically new way to deal with the 
backhaul bottleneck problem. Our main idea is is to provide 
the femto-base stations, typically served by rate-limited DSL 
backhaul links, with large storage capacity. In addition, we 



may also think of deploying fixed wireless caching nodes 
with no backhaul link at all, beyond the standard wireless 
connection to the cellular network. These stations, which we 
henceforth call caching helpers, or simply helpers, form a 
wireless distributed caching infrastructure. 

This novel architecture is based on the following obser- 
vations: (i) most video traffic is generated by encoded and 
pre- stored video in large video on demand servers such as 
Hulu, YouTube and Netflix, rather than live streaming, which 
amounts for a relatively tiny fraction of the overall video traf- 
fic. A large portion of the video on demand traffic corresponds 
to a few popular files; (ii) disk storage is a quantity that 
increases faster than brute-force increase of spatial reuse (with 
the associated increase of backhaul capacity requirement). 
Based on these seemingly simple ideas, we propose a system 
architecture that achieves the required throughput at consid- 
erably lower cost. The key insight is that if there is enough 
content reuse, i.e. many users are requesting the same few 
video files, caching can replace backhaul communication. The 
most popular files are stored in the cache, and are therefore 
always available locally to the UTs (User Terminal) that are 
requesting them. 

More concretely, the helpers are placed in fixed positions in 
the cell and are assumed to have (i) large storage capacity, (ii) 
localized, high-bandwidth wireless communication capabilities 
which enable high frequency reuse, and (iii) low-rate backhaul 
links which can be wired or wireless. They can cache popular 
files and serve requests from mobile User Terminals (UTs) 
by enabling localized communication. Our approach is thus 
fundamentally different from a heterogeneous network using 
femto base stations, which do not have caches, and thus need 
to obtain any file through their backhaul network when it has 
been requested locally. 

The helpers are working in conjunction with conventional 
macrocellular base stations, which provide to the UTs the 
video files that cannot be obtained from the helpers. Clearly, 
the smaller the percentage of file requests that has to be 
fulfilled by the macrocell, the larger the number of UTs that 
can be served. We will show later on in this paper that, with 
realistic setups, the number of users that can be served is 
increased by as much as 300 — 500%. 

The key question for such a system is the wireless dis- 
tributed caching problem, i.e., which files should be cached 
by which helpers. If every mobile device has only access 
to exactly one helper, then clearly each helper should cache 
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the same files, namely the most popular ones. However, for 
the case of dense helper deployment, each mobile device can 
access multiple caches and sees a distributed cache that is the 
union of the helpers' caches. As we show later, in this case, 
the assignment of files to helpers becomes highly nontrivial. 

We consider two versions of the distributed caching prob- 
lem. 

• In the first case, only complete files are stored in the 
helpers caches directly. We refer to this case as the 
uncoded distributed caching problem and we will show 
that finding an optimal placement is computationally 
intractable (NP-complete). Clearly, coding is used at the 
physical layer to achieve reliable communication. The 
uncoded nature of this problem consists of the fact that 
the raw data packets are directly stored in the caches. 

• In the second case, we allow coding. For simplicity, we 
assume that it is implemented by an ideal MDS rateless 
code. An MDS rateless code generates an arbitrarily 
long sequence of parity symbols from an information 
packet of fixed size of B symbols, such that if the 
decoder obtains any B partity symbols, it can recover 
the original B information symbols. It is well-known 
that this behavior can be closely approached (at the cost 
of a small redundancy overhead) by Raptor codes ||4|. 
Hence, in this case the individual identity of the packets 
is not relevant. Rather, what matters is how many parity 
symbols of a given file are retrieved from the helpers 
within the reach of each user. We will show that the coded 
distributed caching problem is a convex optimization 
problem that can be solved efficiently. By introducing 
additional auxiliary variables we can further reduce it to 
a linear program. 

The contribution of this work is thus four-fold: 

• Femtocaching: We propose a novel network architecture 
for video delivery that replaces (expensive) high-rate 
backhauls with (cheap) storage units combined with low- 
rate backhauls. We formalize the wireless distributed 
caching optimization problem, i.e., the question of which 
files should be assigned to which helpers. While there 
is a substantial amount of prior work on caching al- 
gorithms for web and video content (e.g., |5|, ||6|, ||7| 
and references therein), to the best of knowledge, all this 
related work focuses on wired and p2p networks and has 
a different focus and constraints, compared to the wireless 
problem we investigate here. 

• Uncoded Femtocaching: The uncoded distributed caching 
problem is a special covering problem that involves plac- 
ing files in helpers to minimize the total average delay of 
all UTs or equivalently maximize the weighted distributed 
caching that all the users see jointly. We show that 
finding the optimum placement of files is NP-complete. 
Further, we express the problem as a maximization of 
a submodular function subject to matroid constraints 
(see also |[8| for other applications of matroid theory in 
networks). 

• Coded Femtocaching: For coded distributed caching 
problem, we show that the expected delay minimization 
problem can be formulated as a convex optimization 



problem. 

• Performance evaluation: We present a detailed simulation 
of a university campus scenario covered by a single 
macro-BS (using the cellular standard 3GPP LTE R8) and 
several helpers using a simplified 802.1 In protocol. File 
requests are made according to real-world user request 
based on traces of YouTube videos from the Amherst 
campus ||9| |TQ| . Our simulations show that there are 
very significant gains even when very simple caching 
algorithms are used. 
The remainder of the paper is organized as follows: Section 
II presents the system model and assumptions. In Section III 
we formulate the uncoded distributed caching problem and 
present the solution for it. Section IV discuses the coded 
distributed caching problem. In Section V we verify our 
architecture by simulation results which are based on the 
experimental YouTube requests. A summary concludes the 
paper in Sec. VI. 

II. Distributed caching placement model and 

ASSUMPTIONS 

We consider a single cell, equipped with a base station 
(BS), serving a large number of User Terminals (UTs), with 
the help of dedicated content distribution nodes, or helpers. 
The helpers are placed in fixed positions in the cell and 
are assumed to have (i) large storage capacity, (ii) localized, 
high-bandwidth communication capabilities which enable high 
frequency reuse, and (iii) low- rate backhaul links which can 
be wired or wireless; they thus form a distributed caching 
infrastructure. Figure [T] illustrates the system layout. The key 
point is that if there is enough content reuse, i.e. many users 
are requesting the same video content, caching can replace 
backhaul communication. 

The helpers are working in conjunction with conventional 
macrocellular base stations. If a UT requests a file that 
is cached in local helpers, the helpers handle the request; 
otherwise, the BS should serve the request. Clearly, the smaller 
the percentage of video requests that has to be fulfilled by the 
macrocell, the larger the number of UTs that can be served. 




Fig. 1. An example of the single-cell layout. UTs are randomly distributed, 
while helpers can be optimally placed in the coverage region. 

Here, we formulate the following wireless distributed 
caching problem: for given popularity distribution, storage 
capacity in the helpers and wireless communication model 
for the BS to UTs downlink and helpers to UTs links, how 
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should the files be placed in the helpers such that the average 
sum delay of all users is minimized? Since users experience 
shorter delay when they locally download from helpers in 
their neighborhoods instead of the BS, minimizing the av- 
erage delay for a given user is equivalent to maximizing the 
probability of finding the desired content in the helpers within 
the UT's reach. The solution is trivial when there are few 
helpers in the cell and as a result each UT can connect only 
to a single helper. In this case, each helper should cache the 
most popular files, in sequence of popularity, until its cache 
is full. If the helper deployment is dense enough, UTs will be 
able to communicate with several such helpers and each sees a 
distributed cache that is the union of the helpers caches. In this 
situation, the question on how to best assign files to different 
helpers becomes a much more complicated issue, because each 
UT sees a different, but correlated, distributed cache. In section 
[In| we illustrate through examples the complexity of finding 
the optimum file placement when some users are connected 
to more than one helper. Moreover, we show that the uncoded 
distributed caching problem is NP-complete. 

We assume that the popularity distribution of the files 
changes slowly. Typical examples include popular news, con- 
taining short videos, which are updated every 2-3 hours, new 
movies, which are posted every week, new music videos, 
which are posted (or change popularity) about every month. 
Due to time-scale decomposition, the popularity distribution of 
the files is effectively fixed; furthermore it can be learned by 
the system, and thus be assumed known for our further consid- 
erations. Moreover, the cost of refreshing the helpers' content 
can be safely neglected. Once the optimal content placement 
is determined, the BS centrally populates the helpers' caches 
using weak backhaul links. 

There are H helpers, K user terminals, and a library of 
files, denoted by J^. All files are assumed to have the 
same size. This assumption is mainly used for notational 
convenience, and could be easily lifted by considering a finer 
packetization, and breaking longer files into blocks of the same 
length. The popularity distribution of the files conditioned on 
the event that a user makes a request is denoted by P^, for 
n = 1,...,A/'. The connectivity between users and helpers 
can be represented in a bipartite graph; one example is shown 
in Figure [2] If there is an edge between helper h and UT k, it 
means that UT k can communicate reliably with helper h. In 
practice, the connectivity graph is determined by the location 
of users and helpers and transmission radius of the helpers. 
We assume that the connectivity between UTs and helpers 
does not change during the transmission of a video file. This 
requires our users to be fairly static compared to the download 
time. 

When a user requests some file n, it first asks its local 
helpers, i.e., helpers in the neighborhood of the user in the 
user-helper connectivity graph. We assume that, at the cost 
of a small protocol overhead, the BS maintains a list of all 
the helpers caches content. A user makes a request, and it is 
redirected to a local helper if the request is available there, or 
it is handled by the BS directly if it is not available. 




Fig. 2. Example of a connectivity bipartite graph indicating how UTs are 
connected to helpers. 

III. Uncoded distributed caching placement 

In the uncoded case, a file can be either entirely cached 
or not cached at all. As mentioned in Sec. I, if each UT can 
communicate to only one helper, the optimal caching policy 
is simple: each helper should cache the most popular files. 
When a user has connection to multiple helpers, however, 
the caching policy becomes non-trivial, as shown in the 
example of Figure |3] There are two helpers and four UTs. 
The dashed circles centered around helpers indicate the helper 
transmission radius. Assuming that each helper can cache M 
files, users Ui and U2 would prefer helper Hi to cache the M 
most popular files since this minimizes their expected delay. 
Similarly, user U4, would prefer that helper H2 also caches the 
M most popular files. However U^, would prefer Hi to cache 
the M most popular files and H2 the second M most popular 
(or the opposite). This effectively creates a distributed cache 
of size 2M for user U^,. As can be seen, in the distributed 
caching problem, the individual objectives of different users 
may be in conflict. 

/" © /\ \ 
! ( © j © i 

Fig. 3. Distributed Caching example: two helpers if 1, if 2 and four users 
with conflicting interests. 

Assume that the set of available transmitters for user 
k is equal to M{k) = {h\,h^, . . . ,h^j^^j^^^_-^,BS}. 
The reciprocal of the average data rates for the link 
between user k and its local helper h (including BS 
as the "helper" number h = \M{k)\) are denoted by 
Vtk = M,cjt....,cj^^^^^|_-L,a;^^(^)|}. As a matter of 
fact, the instantaneous rate of these links fluctuates because 
of fading and multiple-access scheduling, so that the 
instantaneous rate might be zero on some slots. Nevertheless, 
assuming that each user is a "drop in the ocean", i.e., that 
the mutual influence of the users on each other is negligible, 
we may consider each link as a channel of variable capacity, 
and given average rate that depends only on the link itself. 
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i.e., basically on the distance between the helper and the 
user. At this point, assuming that the file size is large 
enough such that long-term time averages are meaningful, the 
experienced delay for user k for downloading B bits from 
helper h is given by B x uj^ where the uj coefficients have 
the dimension of sec/bit, i.e., they are reciprocal of rates. 
Regarding cj^^, it can be observed and shown analytically 
that under the standard proportional fairness downlink for 
high-speed data HSDPA in 3GPP, Ev-DO in 3GPP2, and 
LTE R8 1 2 1, the delay of downloading from the BS depends 
on the overall number of active users in the system, on 
the individual user position in the cell and it is linear with 
the file size fTTl. To experience lower delay, users prefer 
to get files from transmitters that are closer to them. So, 
the average delay normalized to the file size for user k will be: 

where Ch is the files in the cache of helper h. Considering 
the finite (albeit large) storage capacity of helpers, we wish 
to find the optimal placement of the N files on the helpers' 
caches in order to minimize the total average delay of all 
UTs., i.e., Dk. We can write the optimization problem as 
follows: 

K 

maximize Y^^%s ~ 

k=i 

subject to \Ch\< M, M h, (2) 

where the optimization is with respect to the sets {Ch ^ - 
V/i}, M is the cache capacity of each helper and \Ch\ is the 
cardinality of set Ch- The constraint indicates that no more 
than M files are allowed to be placed in each helper's cache. 

In the following, we will show that the above problem is NP- 
complete. Then, we formulate the problem as maximization 
of a monotone submodular function over matroid constraints. 
Moreover, we present an algorithm that gives results that are 
provably within a constant factor from optimality. 

1 ) The uncoded distributed caching placement problem and 
computational intractability: To show that the optimization 
problem in ^ is NP-complete, we consider a special case of 
the problem such that all reciprocal rates between users and 
helpers are zero. In this case, the optimization problem in (|2]) 
can be written as: 

K 

maximize E^b5 ^ Pn 
subject to \Ch\< M, \/ h 

^^^yjWm-^Ch^ \fk, (3) 



where Ak is the union of the helpers' caches in the neigh- 
bourhood of user k in the connectivity graph of the network. 
The above objective function can be interpreted as the sum 
of values seen by each user. The value of each user k is 
equal to ^BS^neAk which is proportional to the total 
popularity values of the union of files seen by user k through 
its neighborhood helpers. Thus, the more popular files a user 
k finds in its local helpers (in the set Aj^) the larger value it 
sees. Our goal here is to maximize the sum of values seen by 
all users. To prove that the problem is NP-hard, we consider 
its corresponding decision problem, called Helper Decision 
Problem. 

Problem 1: (Helper Decision Problem) Let's denote the 
connectivity graph in Figure [2] by bipartite graph G = {U^H, 
E) where U and 1-L denote the sets of UTs and helpers, 
respectively, and E denotes the set of edges between elements 
(nodes) of U and H. We ask the following question: given 
the graph G = {U^H^ E) and a library of files and a real 
number Q > 0, does there exist any way of placing elements 
of T in the nodes of such that |G/,| < M for all /i G H 
and the objective function in ^ satisfies: 

K 

Y,^%S Y Pn>Q. (4) 

k=l n^Ak 

where A^ is the set of accessible files by UT k e U and 
is defined in The set of coefficients for all UTs 

k e U is denoted by ^bs- Files belonging to the library T 
are denoted by fi, f2, fAr and each file has popularity 
Pn. The popularity distribution for all files in the library J" 
is denoted by P. Let the problem instance be denoted by 
HLP{U,n,E,J^,I>,nBS.M,Q). 

It is easy to see that HLP{U,n, E, T.P .Qbs^ M,Q) is 
in the class NP. To show NP-hardness we will use a reduction 
from the following NP-complete problem. 

Problem 2: (2-Disjoint Set Cover Problem) Consider a bi- 
partite graph G = {A^B^E) with edges E between two 
disjoint vertex sets A and B. For every B G S, define a subset 

A/'(B) (neighborhood of B), clearly ^ = IJ A/'(B). Do there 

BeB 

exist two disjoint sets Si, S2 C B such that + IS2I = \B\ 
and ^ = U A/'(B) = IJ A/'(B). Let us denote the 

problem instance by 2DSC{A, S, E). 

The above problem for finding the 2-disjoint set cover is NP- 
complete |12|. We show in the following lemma that given a 
unit time oracle for Helper Decision Problem we can solve 
the 2- Disjoint Set Cover Problem in polynomial time (a 
polynomial time reduction is denoted by <l). 

Lemma 1: 2-Disjoint Set Cover Problem <l Helper Deci- 
sion Problem 

Proof: Consider an oracle that can solve any prob- 
lem instance HLP{...) in unit time. Then solving an 
instance of 2DSC{A^B^ E) is equivalent to solving 
i7LP(AS,^,{fi,f2},{^,^},{l,l,...,l},l,i^) where 
1^1 = and e < 1. 

Consider A to be the set of UTs, B to be the set of helpers 
and E to be the edges representing connections between UTs 
and helpers. There are only two files fi and f2 with popularity 
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Pi = and P2 = j^. The user dependent coefficients 
Qbs are assumed to be 1 for all UTs and the cache capacity 
of each helper is equal to 1. We check if the sum of values 
seen by all users can be greater than or equal to K. Since all 
the helpers have capacity 1 they can either cache file fi or 
f2. If the utility function in ^ is greater than or equal to K, 
then it has to be equal to K because every user can at most see 
1. This can only happen if the helpers caching fi and helpers 
caching f2 (both being disjoint since every helper can have 
either file only) cover the entire set of users A. This means 
there exist 2 disjoint set covers. An illustration is provided in 
Figure |4] 

Conversely, if two disjoint set covers exist, then one can 
assign fi to the first set cover and assign f2 to the second set 
cover and the HLP instance will be satisfied since the sum 
of value seen by all users will he K. ■ 



Set Cover 1 Set Cover 2 




Fig. 4. Figure illustrating the reduction from 2-Disjoint Set Cover Problem. 

2) Expressing the placement as maximizing a monotone 
submodular function over matroid constraints: In this part, 
we show that our constraints and our objective function 
are independent sets of a matroid and submodular function, 
respectively. The definition of the matroids and submodular 
functions are given in the appendix. First, we define a ground 
set S. Denoting the placing of file n to the cache of helper h 
by the element f^, the ground set is 

The ground set can be partitioned into H disjoint sets, 
Si,...,Sh where Sh = {f^ , fs^, f^} is the set of all files 
that might be placed in the cache of helper h. 

Lemma 2: The constraints in ^ can be written as partition 
matroid on the ground set S defined in (pi. 

Proof: In the optimization problem ^2]), we want to find 
the optimum way of placing files in the helpers' caches. Every 
way of content placement can be expressed by a set X C 5, 
called the placement set. So, for example if G X, the file 
n is placed in the cache of helper /i. A set of elements which 
are placed in the cache of helper h are equal to = X fl 6*^ 
where Sh is a subset of the ground set S associated to the 
helper h. So, the constraints on the cache capacity of helpers 
can be expressed as X C X where 

X = {X^S:\X{^Sh\<M, \/h = l,...,H}. (6) 

Comparing X in ([6]) and the definition of the partition matroid 
in ( [2Q| ), we can see that our constraints form a partition matroid 



with / = H and /c^ = M for z = 1, i^. The partition matroid 
is denoted hy M = {S,X). ■ 
Having proven that the constraints form a matroid, we can 
rewrite the average delay of user /c as a function of placement 
set X. 

n n 

+ ool PniX^, (n) A Xl, (n) A X^, (n)) + . . . 

n 

+ a;|^^P„(X,....AX,.^^^^_^^), (7) 

n 

where A is the 'AND' operation. X^ is the boolean represen- 
tation of Xh = XnSh and X^ is the complement of X^. The 
nth element of X^ is denoted by X^(n) which means that if 
the element is in the set X^, the nth element of vector X^ 
is 1; otherwise, it is 0. 

Lemma 3: The objective function in the optimization prob- 
lem in ([2]) is a monotone submodular function. 

Proof: We first prove that the objective function is a 
submodular function. Since the sum of submodular functions 
is submodular, it is enough to prove that for a user k the set 
function = uj^^ — is a submodular function. We show 
that the marginal value of adding a new file to an arbitrary 
helper decreases as the placement set becomes larger. The 
marginal value of adding a new element to a placement set 
X is the amount of increase in the set function due to the 
addition. 

Let's consider two placement sets Xi and X2 where Xi C 
X2 C 5*. We add an element fn' G 5'\X2 to both placement 
sets. In other words, we add file n to helper h^. To prove that 
set function Fk is submodular, we should distinguish between 
two cases. In the first case, according to the placement set 
X2, user k gets file from helper where j < i. Then, the 

marginal value of element fn' is zero. For the placement set 
Xi user k downloads file from helper hj, where > j. If 
/ < i, again the marginal value will be zero. However, when 
/ > by adding file n to helper h^, user k downloads the file 
from the nearest helper h^. So the marginal value of element 

fn' will be -PnUjf + PnCJ^/ > 0. 

In the second case, according to placement set X2, user k 
accesses file n through helper where j > i. So the marginal 

value of adding element fn' is —Pn^i + Pn'^j- Since in the 
placement set Xi, user k downloads the file from helper h^, 
where / > j, the marginal value for the placement set Xi 

which is equal to fn^ is —Pn^^ + Pn^^' is equal and greater 
than the marginal value for the larger placement set X2. 

■ 

A greedy algorithm is a quite common way of maximizing 
a submodular function subject to a matroid constraint. The 
greedy algorithm starts with an empty set; at each step, it adds 
one element with the highest marginal value to the set while 
maintaining independence of the solution. The precise greedy 
algorithm is stated below. Classical results on approximations 
of submodular functions |13| established that the greedy 
algorithm achieves a performance that is provably within a 
factor ^ of the optimal value. 
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TABLE I 

The greedy algorithm 



algorithm 


Greedy 


Xq ^ Xq+ 


Initialize = for h=l,. 


..,H 


D = S 




X ^ X + fi 


Xh^^ for h=l, 


..,H 


D ^ D\ff 


X ^ 




Dp ^ Dp\fi 


For i = 1,2, ...,M X H 




lf\Xp\ = M 


ff = argmaxdei^ fx{d) 




D ^ D\D(3 


If /x(ff) = 




Endlf 


break 




EndFor 


Endlf 




Ouput X 



In table I, with some abuse of notation, /(.) is the objective 
function in 0. As defined in ([22]), fx{d) = f{X^d)-f{X) 
is the marginal value of an element d with respect to a place- 
ment set X. At every iteration, the greedy algorithm selects 
the element ff with the highest marginal value. Selecting the 
element ff means that the file a is cached by the helper (3. 
It can be seen that the marginal value of any element with 
respect to any placement set is greater than or equal to zero. 
Moreover, because the objective function is submodular, the 
marginal value of elements decreases as we add more elements 
to the placement set. Thus, if at one iteration, the marginal 
value of a selected element is zero, the marginal value of the 
next selected elements would be zero. Hence, provided that the 
marginal value of the selected element is zero at one iteration, 
the algorithm should stop. The selected element is chosen from 
all elements in set D. The initial value of set D is the ground 
set S. At every iteration, we remove the selected element 
from the set D. If the cache of a helper (3 becomes full, we 
remove the entire set I)/^, the subset of D whose elements are 
all associated to helper /3, from the set D. So, elements of 
the helper (3 will not be considered in subsequent iterations 
for selecting the element with the highest marginal value. The 
placement set X and a set of elements cached by each helper 
h, i.e., Xh, are also updated in each iteration. At the end, the 
greedy algorithm returns the placement set X. 

For maximization of a monotone submodular function sub- 
ject to matroid constraints, a randomized algorithm which 
gives a (1 — 1/e) -approximation has been proposed in |14| . 
This algorithm has two parts. In the first part, the combinatorial 
problem is replaced with the continuous one and the approx- 
imate solution of the continuous problem is found. In the 
second part, the fractional solution of the continuous problem 
is rounded using a technique called Pipage rounding p4| . 
Although this algorithm gives a better performance guarantee, 
when the size of the ground set, equal to x in our problem, 
becomes large, it becomes too computationally demanding to 
implement. 

IV. Coded distributed caching placement 

In this section, we want to find the optimum way of placing 
fountain/MDS-encoded files in the helpers' caches to minimize 
the total average delay of all users. 

With fountain/MDS codes, user k should receive B coded 
symbols of file n in the union of the caches of transmitters 
in JV{k). If the first j < |A/'(/c)| transmitters in set A/'(/c) are 
enough for the recovery of some file n while user k cannot 



recover the file by just receiving the parity symbols of the 
first j — I transmitters, after normalizing the number of coded 
symbols to the common file size B, the delay for user k 
because of file n is equal to: 



or 



E 



^hPhn 



(1 



E 



Phnj^j 



he{h'l, 



(8) 



where phn is the amount of coded symbols for file n in the 
cache of helper h. Clearly pssn is equal to 1 for all n. (1 — 
^] phn) is the amount of parity symbols that 
should be transmitted by the jth transmitter in J\f{k). The 
above expression can be used if ^y^^^^y^k y^k j phn < 1 and 
h^} Phn ^ 1- The matrix whose element in the h- 
th row and n-th column is p = [phn], and it is referred to 
as the placement matrix. Because of the normalization, all the 
elements of placement matrix are in the range [0, 1]. 

The delay incurred by user k because of downloading 
file n is a piecewise-defined affine function of the elements of 
the placement matrix p, 



Ph^n ^ 1 

Phn < 1, 



D 



n,\N{k)\ 



^heN{k)\BS Phn < 1- 

We have the following result: 

Lemma 4: is a convex function of p. 

Proof: A function that is pointwise maximum of finite 
number of affine functions is convex (15). Now we show that: 



DV 



max JJ. '-^ . 

jG{l,2,...,|Ar(/c)|} 



(9) 



This means that if for some given j, X]/^^|^fc ^fc j phn 
and T.he{h^,,...,h^}Phn > h and therefore = D^^ 
have D'^'-^ > L^^'V i ^ j. Thus, from ([s]), we should show 
that: 



< 1 



we 



E 
E 



^hPh 



^hPhn 




Y Phn 



Phn 1 (10) 



where the first and second expressions in the above equation 
are respectively D^'^ and D'^'^ . We only discuss the case i > 
j since the proof is similar for the case i < j. After some 
manipulations, the above expression for i > j can be written 
as: 
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Phn 

^he{h^,...,hH 



(11) 



TABLE II 

Simulation parameters based on LTE specifications. 



E 



he{h^^,,...,h^_,} 



Since T.he{h^,,...,h^}Phn > 1 and ujf > cj^for h e 
{hi, ... ^ h^_i}, the above expression is always true. ■ 
The average delay of user k can be written as: 



AT 



(12) 



where Pn is the probability of requesting file n. From ([12]) 
and (|9]), the expected delay of all users is: 



K N 



max ' 

\N{k)\] 



k=l n=l 

Thus, the placement optimization problem takes on the form: 

K N 

minmuze 

k=l n=l 
N 

subject to ^ Phn ^ y h 

n=l 

< Phn < 1, V/i,n, 



7^ 1 ^ 1 J ^ I 



max iJ,'-^ 

2,..., mm " 



(13) 



where the optimization is with respect to p. The first constraint 
indicates the cache capacity of each helper is equal to M files. 
The second constraint shows the amount of parity symbols of 
file n cached by a helper h is normalized to the size of files. 
The objective function in ([13]) is convex function since it is 
positive weighted sum of convex piecewise linear functions 
ITSl . Hence, we have minimization of convex function over 
linear constraints which can be solved using standard con- 
vex optimization. The number of involved variables in the 
optimization problem is quite large. However, some tricks 
and approximations (like bundling of video files with similar 
properties into groups that form the basis of optimization) can 
alleviate these problems. Furthermore, the optimization can be 
solved in a distributed way |16|. 

We also can solve the placement optimization problem by 
converting it to a linear program. To do this we introduce new 
variables and new constraints as follows: 



K N 







n=l 








sl'^>0 VA;,nandi G {1,2,. 




N 




J2phn<M, V /l 

n=l 




0<Phn<l, \fh,n, 


(14) 



minimize 
subject to 

subject to 



where = max^^ji 2,. 
with respect to p, f^s and s^'-^s. 



\Af{k)\} and the optimization is 



Parameter of the System 


Value Assigned 


Usable System badnwidth 


18 MHz 


Sub-carrier Bandwidth 


15 KHz 


Parameter Nt 


7 OFDM symbs. = 0.5ms 


Parameter Nf 


12 Sub-carriers = ISOKHz 


Smallest Resource allocation slot 


Nt = 7 OFDM symbs. x 
Nf = 12 sub-carriers. 


Coherence time 


100ms = 1400 OFDM symbs. 
= S X Nt OFDM symbs. 
across time 


Parameter S 


^ = 200 slots (Nt OFDM 
symbs. each) 




Actual resource allocation block 


NtxNfXS = 16800 OFDM 
symbs. 


Frequency blocks for allocation 


100 



V. Experimental evaluation 

In this section, evaluation of the coded and the uncoded 
cache placement optimizations through simulation in a cell 
based on 3 GPP LTE Release 8 is presented. We assume a 
macro BS operating with a conventional proportional fair 
scheduling policy and helpers with some storage capacity 
serving users using WiFi-like links. We simulate the system 
with realistic user request pattern and evaluate the benefits 
of the femtocaching architecture in the uncoded and coded 
cache placement case, in terms of the achieved average file 
downloading delay. 

We assume a single BS serving a circular- shaped cell with 
radius 400m (typical of urban macro cell p7| ) and random 
statistically independent user positions, uniformly distributed 
over the cell. The pathloss function (in dB) is taken to be: 

{ 38 + 20*logio(c^(^,v)) ,d{u,v)< 40 
38 + 20*logio(40) + 
35 * logio {d{u, v)/40) , d{u, v) > 40 

(15) 

where d{u,v) denotes the distance, in meters, between the 
transmitter located at u and the receiver located at v where 
u,v G R^. Let the ratio between the BS signal power and the 
noise power spectral density at the transmitter and receiver be 
g{u,v) and Gq respectively. Then g{u,v) is given by, 

lOlogio(^K^)) = lOlog,^{Go) -PL{d{u,v)). (16) 

In a multi-cell scenario, users experience about — IdB to —4 
dB of SINR at the cell edge, for a system with universal 
frequency reuse (frequency reuse 1) ISj. Hence, we fix Go 
such that received SNR at the cell edge is equal to dB. 

We assume that the wireless channel is frequency selective 
and make a standard block-fading approximation of the small- 
scale Rayleigh fading, such that the fading channel coefficient 
is constant over a block of a certain duration in time (co- 
herence time of Nt X S OFDM symbols) and over a certain 
bandwidth in frequency (coherence bandwidth of Nf sub- 
carriers). Furthermore, we assume that the fading coefficients 
are i.i.d. from block to block, and independent across the users. 
We assume an OFDM TDMA cell system closely inspired by 
3GPP LTE release 8 |18|. Figure |5] shows the time-frequency 
structure and the resource allocation slot (in yellow) of the 
downlink channel. More details are given in Table |ll| 
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Fig. 5. A pictorial representation of the time-frequency allocation slots in 
OFDM/TDMA. 



The assumed signal model between the BS (located at the 
origin) and user k, located Sit u^, over time-frequency slot 
(t, /), with t = 0, 1, 2, . . . and / G {1, . . . , F}, is given by 



Ykit, f) = V^K,0)i/fe,o(t, /)xo(t, /) + Zk{t, /), (17) 

where Hk^o{tj f) ~ CA/'(0, 1) is the Rayleigh fading coeffi- 
cient on slot (t,/), and xo(t,/) G C^, y/c(t,/) G and 
z/c(t, /) G denote the transmit, received and noise vectors, 
with z/c(t, /) ~ CA/'(0, 1), independent across time-frequency. 
The achievable rate of user k on slot (t, /) is given by 

Rk{t, f) = log (1 + g{uk, 0)|i7fc,o(t, . (18) 



A. Scheduling Policy used 

A UT is said to be "idle" if it has no active download, 
and "active" otherwise. Active UTs cannot place more than 
one request at a time. If they do, all the previous requests are 
dropped. This models the fact that very closely spaced requests 
reflect a change of interest in the users. 

We assume Proportional Fairness Scheduling (PFS), cur- 
rently used in Ev-Do and HSDPA high-data rate downlink 
schemes in 3G 1 18], that assigns exactly one user to each time- 
frequency slot (TDMA/FDMA). Under the infinite backlog 
assumption, for all users the policy ensures that the system op- 
erates at the proportional fairness point of the system ergodic 
rate region [19] |20|. In our system, the PFS policy is applied 
to active users only. Let i^on(^) denote active users at time t. 
Assuming that i^on(^) oscillates a little for a large number of 
users, it approximates an infinitely backlogged scenario with 
Kon{t) active users. When a user becomes idle, it is eliminated 
from the scheduling policy. Thus, at each slot time t, for each 
frequency slot /, the users k{t^ f) G {1, . . . , K} are chosen 
subject to the TDMA/FDMA constraint, by maximizing. 



Konit) F 

k=l f=l 



(19) 



where {Qk{t)} are the scheduling weights obtained dynami- 
cally by an update scheme based on virtual queues to optimize 



the PFS utility function, and where Rk{tj f) is defined in ( 18 ) 
The reader is referred to [19J [20J for more details. 



B. Trace based User Requests 

Since the advantage of the femtocaching architecture over 
the conventional cellular (BS only) case depends very strongly 
on how "peaky" the popularity distribution is, we can make 
the femtocaching gain be arbitrarily large just by assuming a 
suitable popularity distribution. Therefore, instead of choos- 
ing an artificial distribution, we used a measured trace of 
YouTube requests from a study conducted on the University 
of Massachusetts' Amherst campus in 2008 f9l fTO]. The 
study records YouTube requests arising from the wired campus 
network for several days; we assume that there is no significant 
difference in the user behavior to downloads with wireless 
devices. To be more specific, we target a scenario where the 
BS is swamped with YouTube-like (short videos) requests 
and consider the average downloading delay as the metric 
of interest. We assume that all requests are of size 30 MB, 
which is reasonable assuming a screen size 640 x 360, flash 
encoding and a few minutes (^3 — 4 min) playback time. 
The simulations, analysis and conclusions also hold (in a 
scaled form) for larger file sizes. All traces used in simulations 
contain unique users (based on unique IP addresses from trace 
data) who request files during the time period. For every user, 
a time series involving the exact time of request (in seconds) 
is created and the corresponding video file number (can be 
distinguished from the trace data) is created. The user requests 
are simulated exactly as per these traces. 

We use the trace data fT\\ for the day 02/19/08. There are 
a few thousand unique users for the entire day. We choose the 
busiest four hours of the day, which accounts for 848 unique 
users and about 4600 requests, and form trace 1. We assume 
we know a priori the popularity distribution (number of views 
vs rank of videos in terms of views) for the day from the data 
of the entire day. This is used for assigning popularity to the 
files requested. This presupposes sophisticated algorithms that 
could predict the popularity distribution from the past activity 
in the cell; for a discussion of this assumption see Sec. II. 

We derive another trace by superposing the two busiest four 
hours and creating a merged trace containing 1719 (unique 
user lists from each four hour period are added up) users and 
about 9600 requests. This is in line with the recent predictions 
of the rise in video requests in mobile traffic in the next few 
years. We call this merged trace as trace!. 

The scheduling and request handling take place as per the 
system specifications in the previous subsections. We compare 
the performance of three systems. 

1) Baseline system- Only the BS serves the users. 

2) Uncoded Helper system 

3) Coded Helper system 

The helpers are placed uniformly in a square grid spanning 
the cellular region. We assume a simple model for the helper- 
user communication. Every helper has a range of 100m. The 
rate dies down from 30 Mbps at 10m to 3 Mbps at 100m 
and decay curve follows the curve for case of 2x2 MIMO 
and 20 MHz in Fig 5.9 of p2| but scaled down by a factor 
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of 4 accounting for overheads and the fact that this rate is 
guaranteed irrespective of other users who may request from 
the same helper concurrently. Although simplified, the model 
can be justified by facts about the latest WiFi standard quoted 
| [23| . When a user requests a video file, and if the file lies 
in part or entirely in any of the helpers within range of the 
user, then the BS directs a suitable helper, depending on the 
distance from the user, to serve the user with the appropriate 
fraction of the request and the user gets it from the helper. This 
helper-user communication model supplies average delay, i.e 
uj^ as in ([s]), from helper h to user k based on distance as 
inputs to the coded and uncoded placement algorithms. 

The inputs cj^^, as in ([t]), to the uncoded and the coded 
algorithm are the time averaged download delays for the 
files downloaded by user k from the base station. Generating 
these inputs to the algorithms poses a problem. The actual 
average delay experienced by user k, from the BS, cannot be 
known prior to the placement of files. Since the BS uses a 
scheduling algorithm, the delays are determined by the exact 
traffic pattern. Traffic pattern at the BS changes when helpers 
and placement of files are introduced. Further, it also requires 
knowing the exact traffic pattern from the traces before place- 
ment. In essence, it can be determined only after the placement 
is made and only knowing the exact traffic pattern through 
simulations. We make the following observation to circumvent 
the problem. In essence, both the algorithms described try to 
store the most popular files, but with more sophistication. So, 
average download delays from BS are computed by simulation 
done when BS serves users along with the same number of 
helpers, as in the optimization, but storing the most popular 
files (M most popular files where M is the cache capacity). 
Since exact traffic pattern information cannot be used, we 
use an analytical model for user requests for this simulation. 
In this model, every user behaves as follows: users when 
downloading do not make any additional request at all. The 
wait time between two requests is a random variable with a 
geometric distribution with parameter p f24l . The probability 
of success (probability of requesting when idle) is p every 
second. The expected wait time is 1/p. p can be chosen to 
match the traffic intensity in the trace (No of requests and No 
of simultaneous active users per hour). This analytical model 
captures the 'realistic' coupling between the current service 
and future requests. That is, users who have poor service 
would request less frequently. Note, this does not use the exact 
user request pattern of the traces which means the inputs to 
the placement algorithm only needs some rough estimates of 
traffic intensity for choice of the parameter p. We have chosen 
p = 0.003 and 300 users drawing requests i.i.d from the 
popularity distribution to generate the average delay values 
for the BS for our simulations. This choice of p represents 
reasonably high traffic intensity, i.e a significant fraction of 
the total number (300) of users being simultaneoulsy active. 

To compare the baseline and the helper systems, we define 
for every user, the Quality of Service (QoS) as the average 
download delay. A streaming user is satisfied if its average 
downloading delay is less than its playback time. This means 
that by allowing some pre-buffering and thanks to the playback 
buffer which absorbs the statistical fluctuations, the video can 



be watched without stall till the end. In all simulations, H 
represents the number of helpers, M represents the number of 
30 MB files that a helper can cache. In some plots, storage 
capacity in GB is mentioned and M can be inferred from it. 
QoS is chosen to be 100 seconds and 200 seconds to measure 
satisfied users. Such a QoS is reasonable for a playback time 
of 3 minutes for the videos. 

In general, we observe that the Greedy Helper system is 
strictly better than the Popular Helper System, which in turn 
is better than the Baseline. The number of satisfied users for 
the Coded and Uncoded Helper systems are very close. This 
indicates that helpers can store the entire files without using 
the fountain code and we still have a significant gain. 

Figures [6] and [7] show the number of satisfied users versus 
the storage capacity of helpers. Figure [6] uses trace 1 while 
Figure [7] uses trace2. As expected, increasing the storage 
capacity of each helper enhances the number of satisfied users. 
The difference between the Baseline system and the Helper 
systems in trace2 is larger than that in trace 1. This indicates 
the significant advantages of helpers, in particular, in case 
of a large number of video requests. Thus, in order to have 
reasonable performance for large number of video requests, 
the use of helpers is more crucial. 

Figures [8] and [9] show the number of satisfied users versus 
QoS for trace 1 and trace2, respectively, with H = 45 and 
M = 2000. In Figure [8j the number of satisfied users increases 
more than 400% in both helper systems. Better improvement 
is seen in (Figure l9]). 



Figures 10 and [TT] show the number of satisfied users versus 
the number of helpers for trace 1 and trace2. Increasing the 
number of helpers increases the number of satisfied users. 

Let us recall that some requests are dropped if a user 
requests too frequently. We now compare the systems on the 
basis of successful requests (requests that are not dropped). 



Figures [12] and [13] show the percentage of successful requests 
versus the download time for trace 1 and trace2, respectively, 
with H = 32 and M = 3000. It can be seen that even for 
large download times, the percentage of successful requests 
does not reach 100%. The reason is that, in trace data, most 
users become active for a short time in a day and within 
that short time they request several video files consecutively. 
They usually request another file before the previous one 
is downloaded completely. Since the requests which are not 
downloaded completely are considered as dropped requests, 
dropped requests always exist. However, as it can be seen from 
the plots, helpers reduce the percentage of dropped requests. 
So, by having helpers, not only do we see an increase in the 
number of satisfied users, but also we see a reduction in the 
number of dropped requests. 

VI. Conclusion 

In this paper we introduced a new method for increasing 
the throughput of wireless video delivery networks. The key 
idea is the use of a distributed cache, i.e., helper stations that 
store the most popular video files, and transmit them, upon 
request, via short-range wireless links to the user terminals. 
The caches are low-cost because storage capacity has become 
exceptionally cheap (according to recent prices, two terabytes 
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cost approximately 100 dollars), while the loading of the files 
to the caches can occur through a low-rate (and thus cheap and 
robust) backhaul links at low demand times. We then formu- 
lated and solved the problem of which files should be assigned 
to which helpers. Our numerical evaluation uses a realistic 
LTE-based cellular simulator and a real trace of YouTube 
requests and demonstrates performance improvements on the 
order of 400 — 500% more users at reasonable QoS levels. Our 
conclusion is that a wireless distributed helper system seems to 
be a promising way of alleviating the bottlenecks in wireless 
video delivery. 

Appendix 

Matroids: Matroids are structures that generalize this con- 
cept of independence, as known from linear algebra, to general 
sets. Informally, we need a finite ground set S and a matroid is 
a way to label some subsets of S as "independent". In vector 
spaces, the ground set is a set of vectors, and subsets are 



called independent if their vectors are linearly independent, in 
the usual linear algebraic sense. Formally, we have |25|: 
Definition 1: A matroid is a tuple M = {S^I), where 
is a finite ground set and X C 2^ (the power set of S) is a 
collection of independent sets, such that: 

1 . X is nonempty, in particular, G X, 

2. X is downward closed; i.e., if Y ^ X and X C Y, then 

X eX, 

3. If X,Y e X, and \X\ < then 3y e Y\X such that 
XU{y}eX. □ 

One example is the partition matroid. In a partition matroid, 
the ground set S is partitioned into (disjoint) sets Si; S2; Si 
and 

X = {X C S : \X n Si\ < ki for Sill i = 1 .. .1}, (20) 
for some given parameters /ci , /c2 , . . . , /c^ 
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Submodular functions: Let 5* be a finite ground set. A set 
function / : 2*^ ^ R is submodular if for all sets A^B C 



f{A)^f{B)>f{AuB)^f{AnB). 



(21) 



Equivalently, submodularity can be defined by the following 
condition. Let = /(^ + ~ /(^) denote the marginal 

value of an element i e S with respect to a subset ACS. 
Then, / is submodular if for ail A C B C S and for all 
i G S\B we have: 



(22) 



Intuitively, submodular functions capture the concept of di- 
minishing returns: as the set becomes larger the benefit of 
adding a new element to the set will decrease. Submodular 
functions can also be regarded as functions on the boolean 
hypercube {0, l}!*^! ^ R. Every set has an equivalent boolean 
representation by assigning 1 to the elements in the set and 
to other ones. We denote the boolean representation of a set 
X by a vector e {O,!}!*^!. 

The function / is monotone if for A C 5 C S, we have 

fiA) < m. 
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